multi-fidelity data
- Asia > China (0.28)
- North America > United States (0.14)
- Europe > United Kingdom > England (0.14)
- Information Technology > Information Management (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)
- Information Technology > Data Science (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Multi-task parallelism for robust pre-training of graph foundation models on multi-source, multi-fidelity atomistic modeling data
Pasini, Massimiliano Lupo, Choi, Jong Youl, Zhang, Pei, Mehta, Kshitij, Weaver, Rylie, Aji, Ashwin M., Schulz, Karl W., Polo, Jorda, Balaprakash, Prasanna
Graph foundation models using graph neural networks promise sustainable, efficient atomistic modeling. To tackle challenges of processing multi-source, multi-fidelity data during pre-training, recent studies employ multi-task learning, in which shared message passing layers initially process input atomistic structures regardless of source, then route them to multiple decoding heads that predict data-specific outputs. This approach stabilizes pre-training and enhances a model's transferability to unexplored chemical regions. Preliminary results on approximately four million structures are encouraging, yet questions remain about generalizability to larger, more diverse datasets and scalability on supercomputers. We propose a multi-task parallelism method that distributes each head across computing resources with GPU acceleration. Implemented in the open-source HydraGNN architecture, our method was trained on over 24 million structures from five datasets and tested on the Perlmutter, Aurora, and Frontier supercomputers, demonstrating efficient scaling on all three highly heterogeneous supercomputing architectures.
Keywords: Graph Neural Networks, Distributed Data Parallelism, Model Parallelism, Multi-Fidelity Data, Atomistic Modeling.
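The routing idea described in the abstract (shared message passing layers followed by one decoding head per data source) can be illustrated with a minimal NumPy sketch. This is not the HydraGNN implementation and it runs in a single process, so it does not show the proposed distribution of heads across computing resources. The class name `MultiHeadGraphModel`, the single tanh layer, the mean pooling, and the source labels are all assumptions made for illustration.

```python
import numpy as np


def message_passing(node_feats, adjacency, weight):
    """One shared layer: sum neighbor features plus self, then transform."""
    aggregated = adjacency @ node_feats + node_feats
    return np.tanh(aggregated @ weight)


class MultiHeadGraphModel:
    """Hypothetical toy model: one shared encoder, one linear head per source."""

    def __init__(self, in_dim, hidden_dim, sources, seed=0):
        rng = np.random.default_rng(seed)
        # Shared weights see every structure, whatever dataset it came from.
        self.shared_weight = rng.normal(scale=0.1, size=(in_dim, hidden_dim))
        # One decoding head per data source (illustrative linear heads).
        self.heads = {s: rng.normal(scale=0.1, size=hidden_dim) for s in sources}

    def predict(self, node_feats, adjacency, source):
        hidden = message_passing(node_feats, adjacency, self.shared_weight)
        graph_embedding = hidden.mean(axis=0)  # pool nodes into one vector
        # Route the shared embedding to the head that matches the source.
        return float(graph_embedding @ self.heads[source])


# Toy structure: three atoms in a chain, two features per atom.
feats = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
adj = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
model = MultiHeadGraphModel(in_dim=2, hidden_dim=4, sources=["source_a", "source_b"])
pred_a = model.predict(feats, adj, "source_a")
pred_b = model.predict(feats, adj, "source_b")
```

Both predictions reuse the same shared embedding; only the final head differs, which is what lets each source keep its own output convention while the encoder is trained on all of them.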
- North America > United States > New York > New York County > New York City (0.05)
- North America > United States > Tennessee > Anderson County > Oak Ridge (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Energy (0.68)
- Government > Regional Government (0.46)
- Information Technology (0.46)
GAR: Generalized Autoregression for Multi-Fidelity Fusion
Yuxin Wang
In many scientific research and engineering applications where repeated simulations of complex systems are conducted, a surrogate is commonly adopted to quickly estimate the whole system. To reduce the high cost of generating training examples, combining the results of low-fidelity (fast but inaccurate) and high-fidelity (slow but accurate) simulations has become a promising approach. Despite the fast development of multi-fidelity fusion techniques, most existing methods require particular data structures and do not scale well to high-dimensional outputs. To resolve these issues, we generalize the classic autoregression (AR), which is widely used due to its simplicity, robustness, accuracy, and tractability, and propose generalized autoregression (GAR) using tensor formulation and latent features. GAR can deal with arbitrary-dimensional outputs and arbitrary multi-fidelity data structures to satisfy the demand of multi-fidelity fusion for complex problems; it admits a fully tractable likelihood and posterior requiring no approximate inference and scales well to high-dimensional problems.
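For orientation, the classic AR model that GAR generalizes is usually written as high(x) = rho * low(x) + delta(x), where rho scales the low-fidelity output and delta is a correction term. The sketch below is not GAR itself: it handles scalar outputs only, and it replaces the Gaussian-process correction normally used in AR with a linear correction a + b * x, which is an assumption made to keep the example short. The function names `fit_ar` and `predict_ar` and the toy simulators are hypothetical.

```python
import numpy as np


def fit_ar(x_high, y_low_at_high, y_high):
    """Least-squares fit of y_high ~ rho * y_low + a + b * x."""
    # Columns: low-fidelity output, constant, input -> coefficients [rho, a, b].
    design = np.column_stack([y_low_at_high, np.ones_like(x_high), x_high])
    coeffs, *_ = np.linalg.lstsq(design, y_high, rcond=None)
    return coeffs


def predict_ar(coeffs, x, y_low):
    """Correct cheap low-fidelity outputs toward the high-fidelity response."""
    rho, a, b = coeffs
    return rho * y_low + a + b * x


def low_fidelity(x):
    return np.sin(x)  # toy stand-in for a fast, inaccurate simulator


def high_fidelity(x):
    return 2.0 * np.sin(x) + 0.5 * x + 1.0  # toy slow, accurate simulator


# Only four expensive high-fidelity runs are used for fitting.
x_train = np.array([0.0, 1.0, 2.0, 3.0])
coeffs = fit_ar(x_train, low_fidelity(x_train), high_fidelity(x_train))

# Predict at new inputs using only cheap low-fidelity evaluations.
x_new = np.linspace(0.0, 3.0, 7)
y_pred = predict_ar(coeffs, x_new, low_fidelity(x_new))
```

Because the toy high-fidelity function lies exactly in the span of the fitted terms, the fit recovers it; with real simulators the correction term would need to be more flexible.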
- Asia > China (0.28)
- North America > United States (0.14)
- Europe > United Kingdom > England (0.14)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.88)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.66)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Graph Laplacian-based Bayesian Multi-fidelity Modeling
Pinti, Orazio, Budd, Jeremy M., Hoffmann, Franca, Oberai, Assad A.
We present a novel probabilistic approach for generating multi-fidelity data while accounting for errors inherent in both low- and high-fidelity data. In this approach a graph Laplacian constructed from the low-fidelity data is used to define a multivariate Gaussian prior density for the coordinates of the true data points. In addition, a few high-fidelity data points are used to construct a conjugate likelihood term. Thereafter, Bayes' rule is applied to derive an explicit expression for the posterior density, which is also multivariate Gaussian. The maximum a posteriori (MAP) estimate of this density is selected to be the optimal multi-fidelity estimate. It is shown that the MAP estimate and the covariance of the posterior density can be determined through the solution of linear systems of equations. Thereafter, two methods, one based on spectral truncation and another based on a low-rank approximation, are developed to solve these equations efficiently. The multi-fidelity approach is tested on a variety of problems in solid and fluid mechanics with data that represent vectors of quantities of interest and discretized spatial fields in one and two dimensions. The results demonstrate that by utilizing a small fraction of high-fidelity data, the multi-fidelity approach can significantly improve the accuracy of a large collection of low-fidelity data points.
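The structure of the computation described above (Gaussian prior built from a graph Laplacian, Gaussian likelihood on a few high-fidelity points, MAP estimate from a linear system) can be sketched as follows. The abstract does not give the exact prior or likelihood, so the specific choices here are assumptions: a Gaussian-kernel graph, prior precision `prior_scale * (L + tau * I)` centered on the low-fidelity points, and independent observation noise with variance `noise_var`. The sketch uses a dense solve rather than the spectral-truncation or low-rank solvers mentioned in the abstract.

```python
import numpy as np


def graph_laplacian(points, length_scale=1.0):
    """Unnormalized Laplacian of a Gaussian-kernel graph on the points."""
    diffs = points[:, None, :] - points[None, :, :]
    sq_dists = (diffs ** 2).sum(axis=-1)
    weights = np.exp(-sq_dists / (2.0 * length_scale ** 2))
    np.fill_diagonal(weights, 0.0)  # no self-loops
    return np.diag(weights.sum(axis=1)) - weights


def map_estimate(low_fid, high_idx, high_vals,
                 prior_scale=1.0, tau=1e-3, noise_var=1e-2):
    """MAP mean and covariance of the Gaussian posterior (assumed model)."""
    n = low_fid.shape[0]
    # Assumed prior: mean = low-fidelity points, precision from the Laplacian.
    prior_precision = prior_scale * (graph_laplacian(low_fid) + tau * np.eye(n))
    # Selection matrix picking the rows that have high-fidelity counterparts.
    select = np.zeros((len(high_idx), n))
    select[np.arange(len(high_idx)), high_idx] = 1.0
    posterior_precision = prior_precision + select.T @ select / noise_var
    rhs = prior_precision @ low_fid + select.T @ high_vals / noise_var
    mean = np.linalg.solve(posterior_precision, rhs)  # MAP = posterior mean
    covariance = np.linalg.inv(posterior_precision)
    return mean, covariance


# Toy data: five low-fidelity points in 2D, one high-fidelity correction.
low = np.array([[0.0, 0.0], [0.5, 0.1], [1.0, 0.0], [1.5, 0.1], [2.0, 0.0]])
idx = np.array([2])
high = np.array([[1.0, 0.4]])
u_map, u_cov = map_estimate(low, idx, high)
```

In this toy setup the single high-fidelity point pulls its own low-fidelity counterpart toward it, and the Laplacian coupling drags neighboring points along by a smaller amount.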
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > United States > California > Los Angeles County > Pasadena (0.14)
- North America > United States > New Jersey > Mercer County > Princeton (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)